- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 99
Uncaught socket exception during timeout handling #310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| The windows test suite failure has been happening on master for the last ten days as well. But I'm not sure yet about the ci/circleci: macos-build and continuous-integration/appveyor/pr failures. | 
| @ajyoung there's a few TLS tests that are flaky on slow machines like macOS and Windows. If they happen on master, ignore those. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajyoung thanks for the PR! The fix looks correct but this patch also needs a regression test so that it's checked in the CI and is not only confirmed working in your specific env.
        
          
                cheroot/server.py
              
                Outdated
          
        
      |  | ||
| try: | ||
| req.simple_response(response) | ||
| except socket.error: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would need a long comment explaining why this is necessary.
| This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. | 
| @ajyoung any updates? | 
| @webknjaz -- Sorry for the delay. I haven't forgotten about it but I need to the get the environment setup again so I can run and write the required tests. I'll try to get back to this before Thanksgiving in the US (November 27) | 
0500eb0    to
    d2fde4e      
    Compare
  
    | @webknjaz -- It looks like the test_wsgi tests will no longer run under python 2.7 due to warnings like "CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release" Do you know a way to suppress this warning when running the tests under python 2.7? | 
| It's already suppressed on master and doesn't prevent anything from being executed in this PR. You may be looking at some old CI runs. Your test fails because you attempt using socket as a CM while it doesn't have such interface. | 
| @webknjaz -- I fixed the python 2.7 syntax. The tests pass here on linux but one concern is that I tried the test on the master branch without the fix and the test passes there as well. It seems that either the attempt to set the thread count to 1 or the attempt to set the timeout to 1s is incorrect in the test (so the server isn't actually dropping all threads with just two client calls). I can verify that when I run the master branch manually and the 210 branch manually, that I can reproduce the issue with the same python client code. I run the code below eleven times against the server running on the master branch and it drops all threads. But that doesn't seem to be happening in the automated test. [root@host-4 ~]# cat sslperftest-simple-client-py2.py
#!/usr/bin/python2
import socket
import time
socket_type = socket.AF_INET
s = socket.socket(socket_type, socket.SOCK_STREAM)
s.connect(('host-4.example.com', 11827))
time.sleep(12)
s.close() | 
| @ajyoung that looks like a half of a reproducer, it'd be useful to see the exact server part of the repro too. | 
| @webknjaz -- The server reproduction is here. [root@host-4 ~]vi sslperftest-simple-pyopenssl.py py
#!/usr/bin/python2
certfile='/tmp/ssl-cert.pem'
keyfile='/tmp/ssl-cert-pk.pem'
cafile=None
ciphers=None
host='host-4.example.com'
port=11827
def raw_wsgi_app(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type','text/plain')]
    start_response(status, response_headers)
    return ['Hello world!']
from cheroot.ssl.pyopenssl import pyOpenSSLAdapter
from cheroot import wsgi
bind_addr = (host, port)
server = wsgi.Server(bind_addr, raw_wsgi_app, request_queue_size=32)
server.ssl_adapter = pyOpenSSLAdapter(certfile, keyfile, certificate_chain=cafile, ciphers=ciphers)
try:
    server.start()
except KeyboardInterrupt:
    server.stop() | 
        
          
                cheroot/server.py
              
                Outdated
          
        
      |  | ||
| try: | ||
| req.simple_response(response) | ||
| except socket.error: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajyoung After giving it some thought, I think that this a wrong fix. It relies on the leaking implementation detail.
Instead, we need to change the pyopenssl adapter code to raise a standardized exception defined in https://github.com/cherrypy/cheroot/blob/master/cheroot/errors.py, like in other except-blocks. This would abstract us from the underlying causes. The exception should probably inherit https://docs.python.org/3/library/exceptions.html#ConnectionError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@webknjaz -- I suppose in that case, we'd want to just add the errnum to errors.socket_errors_to_ignore and not throw an exception at all. But in this case, the errnum is -1. It cannot be translated to a specific error. Do we want to handle all -1 errors this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR We need to have our own exception declared for communicating this.
-1 is not at error code, it's just how OpenSSL signals that there's something wrong AFAIU. IIUC OpenSSL is implemented as a state machine and to figure out what error state it is in, one should call SSL_get_error(). For example, here's what PyOpenSSL does: https://github.com/pyca/pyopenssl/blob/52341e8/src/OpenSSL/SSL.py#L1530.
Not throwing an exception is a no-go. It'll probably break some of the mechanisms. Also, exceptions is how we communicate problems while returning some values mean that there's no problems at all which is not at all true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@webknjaz -- I've looked at this more closely now. At the point highlighted above where the socket.error occurs, it is already clear that the underlying error is an SSL.SysCallError. At the point of execution highlighted in server.py above, the ssl context isn't available but the stack trace makes this clear:
[root@host-4 ~]# /opt/cloudera/parcels/KEYTRUSTEE_SERVER/bin/python sslperftest-simple-pyopenssl.py
Exception in thread CP Server Thread-1:
Traceback (most recent call last):
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/workers/threadpool.py", line 125, in run
    keep_conn_open = conn.communicate()
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/server.py", line 1290, in communicate
    self._conditional_error(req, '408 Request Timeout')
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/server.py", line 1339, in _conditional_error
    req.simple_response(response)
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/server.py", line 1111, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/makefile.py", line 78, in write
    data_mv[bytes_sent:bytes_sent + SOCK_WRITE_BLOCKSIZE],
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/ssl/pyopenssl.py", line 163, in send
    *args, **kwargs
  File "/opt/cloudera/parcels/KEYTRUSTEE_SERVER/lib/python2.7/site-packages/cheroot/ssl/pyopenssl.py", line 110, in _safe_call
    raise socket.error(errnum)
error: -1There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we need to raise a better error there. How about FatalSSLAlert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a new commit that makes the fix by raising FatalSSLAlert in cheroot/ssl/pyopenssl.py instead of ignoring the socket.error in cheroot/server.py.
| @ajyoung looks like this failing test is related to the changes in this PR: https://github.com/cherrypy/cheroot/pull/310/checks?check_run_id=1397448854#step:14:682 | 
        
          
                cheroot/test/test_ssl.py
              
                Outdated
          
        
      | tlshttpserver = \ | ||
| tls_single_thread_http_server((interface, port), tls_adapter) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Escaping EOL is harmful, wrap with parentheses instead
| tlshttpserver = \ | |
| tls_single_thread_http_server((interface, port), tls_adapter) | |
| tlshttpserver = ( | |
| tls_single_thread_http_server((interface, port), tls_adapter) | |
| ) | 
        
          
                cheroot/test/test_ssl.py
              
                Outdated
          
        
      | s = socket.socket(socket_type, socket.SOCK_STREAM) | ||
| s.connect((ip_addr, port)) | ||
| time.sleep(3) | ||
| s.close() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe close this socket at the end of the test function?
        
          
                cheroot/test/test_ssl.py
              
                Outdated
          
        
      | pytest.param(ANY_INTERFACE_IPV6, marks=missing_ipv6), | ||
| ), | ||
| ) | ||
| def test_server_timeout_before_content_error( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't have any asserts: I'd expect it to use something like capfd fixture and check what server outputs.
| Just an update on this -- I will get back to it after Christmas, just before the new year. | 
158239e    to
    ddee97e      
    Compare
  
    A DoS would happen in many situations, including TLS errors and attempts to close the underlying sockets erroring out. This patch aims to prevent a situation when the worker threads are killed by arbitrary exceptions that bubble up to their entry point layers that aren't handled properly or at all. PR #649 Fixes #358 Fixes #354 Ref #310 Ref #346 Ref #375 Ref #599 Ref #641 Resolves #365
| Codecov ReportAttention: Patch coverage is  
 
 Additional details and impacted files@@            Coverage Diff             @@
##             main     #310      +/-   ##
==========================================
+ Coverage   79.83%   83.56%   +3.72%     
==========================================
  Files          28       28              
  Lines        4112     4174      +62     
==========================================
+ Hits         3283     3488     +205     
+ Misses        829      686     -143      | 
β What kind of change does this PR introduce?
π What is the related issue number (starting with
#)Fixes #210
β What is the current behavior? (You can also link to an open issue here)
When connections with no content that last for longer than the connection timeout are made to a cheroot server backed by pyOpenSSL, the connection timeout on the server side also causes an unhandled socket error that causes the thread to be dropped. Eventually, the server will extinguish its thread pool and become unresponsive.
See #210 (comment)
β What is the new behavior (if this is a feature change)?
Threads are not dropped when connections with no content that last longer than the connection timeout are made to the server.
π Other information:
π Checklist:
and description in grammatically correct, complete sentences
This change isβ